A pediatric wrist trauma X-ray dataset (GRAZPEDWRI-DX) for machine learning

Digital radiography is widely available and the standard modality in trauma imaging, often enabling the diagnosis of pediatric wrist fractures. However, image interpretation requires time-consuming specialized training. Owing to rapid progress in computer vision algorithms, automated fracture detection has become a topic of research interest. This paper presents the GRAZPEDWRI-DX dataset, which contains annotated pediatric trauma wrist radiographs of 6,091 patients treated at the Department for Pediatric Surgery of the University Hospital Graz between 2008 and 2018. A total of 10,643 studies (20,327 images) are made available, typically covering posteroanterior and lateral projections. The dataset is annotated with 74,459 image tags and features 67,771 labeled objects. We de-identified all radiographs and converted the DICOM pixel data to 16-bit grayscale PNG images. The filenames and the accompanying text files provide basic patient information (age, sex). Several pediatric radiologists annotated dataset images by placing lines, bounding boxes, or polygons to mark pathologies like fractures or periosteal reactions. They also tagged general image characteristics. This dataset is publicly available to encourage computer vision research.


Methods
We constructed the GRAZPEDWRI-DX dataset from image data (pediatric wrist radiographs), natural language (report texts), and human expert annotations (bounding boxes, lines, polygons, and image tags). Annotations were based directly on the X-ray contents, with the aid of the corresponding free-text reports, and were finally combined into the complete dataset. The whole process of dataset construction is presented as a flowchart in Fig. 1.
The project (No. EK 31-108 ex 18/19) was approved by the local ethics committee of the Medical University of Graz (IRB00002556). Requirement for oral or written patient consent was waived due to the retrospective project design. All study-related methods were conducted in accordance with the Declaration of Helsinki and all relevant guidelines and regulations.
Pediatric wrist radiographs. A total of 10,643 wrist radiography studies of 6,091 unique pediatric patients (mean age 10.9 years, range 0.2 to 19 years; 2,688 females, 3,402 males, 1 unknown) were retrieved as Digital Imaging and Communications in Medicine (DICOM) images from the local Picture Archiving and Communication System (PACS). DICOM is the standard medical image format, which enables permanent storage and exchange between different modalities and institutions 26,27. Apart from the pixel information, DICOM images contain metadata (the DICOM header) that follow strict rules. Some of the metadata are mandatory, while others are optional. The DICOM standard is updated regularly 28. Because DICOM still poses some barriers to use, we decided to convert the images to the more widespread Portable Network Graphics (PNG) format. PNG offers lossless compression of the pixel data and supports grayscale depths of more than 8 bits. Radiography DICOMs typically contain 12 bits, sometimes 16 bits, of grayscale information. DICOM images served as input and were read with the "pydicom" package 29. The PNG format does not support 12 bits per channel, so we normalized the grayscale values of all DICOMs to 16 bits, stretching the grayscale histogram from the initial range of 0 to 4,095 (12-bit) to 0 to 65,535 (16-bit) with the "cv2" module. Afterwards, PNGs were saved to the hard disk.
Anonymization procedure. De-identification and image conversion were performed in batch by a custom-coded Python script. PNGs do not store metadata as a standard feature. To retain sufficient information for processing and analysis of the PNG images, we created the file names based on DICOM header information. The first part of the filename was constructed from the concatenated "Blake2b" 30 cryptographic hashes of "Institution" and "Patient ID", which were afterwards hashed with "SHA-3-256" 31. The resulting hash is therefore identical across all images of an individual patient.
The second part of the file name consisted of the Unix timestamp of the acquisition time minus a nine-digit hash of the "Patient ID", which preserves the original intervals between studies while irreversibly masking the original examination times. The series number and image number (zero-padded to two digits) represented the next part of the filename, which was followed by a region token, the laterality and projection, and finally sex and age (rounded to one decimal place). The individual file name parts are illustrated in Fig. 2.
DICOM layers and overlays. Any additional layers or overlays in the original DICOM were irreversibly discarded during the image conversion and de-identification process.
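The pseudonymization of the patient identifier and of the examination time described above can be illustrated with a minimal sketch. It is not the authors' script, which is not published. The function names, the string encoding, and in particular the way the nine-digit offset is derived from the "Patient ID" are assumptions made for illustration only.

```python
import hashlib


def patient_token(institution: str, patient_id: str) -> str:
    """Pseudonymous token: SHA3-256 over the concatenated BLAKE2b digests."""
    inner = (
        hashlib.blake2b(institution.encode("utf-8")).hexdigest()
        + hashlib.blake2b(patient_id.encode("utf-8")).hexdigest()
    )
    return hashlib.sha3_256(inner.encode("utf-8")).hexdigest()


def time_offset(patient_id: str) -> int:
    """Patient-specific offset below 10**9 (hypothetical derivation)."""
    digest = hashlib.blake2b(patient_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10**9


def masked_timestamp(unix_time: int, patient_id: str) -> int:
    """Shift an acquisition timestamp by the patient-specific offset."""
    return unix_time - time_offset(patient_id)
```

Because the same offset is subtracted from every timestamp of a given patient, the difference between two masked timestamps equals the difference between the original acquisition times, while the absolute times are no longer visible.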

Burned in annotations.
After DICOM conversion, including removal of the header information, PNG images can still contain identifying text in their pixel matrix, usually referred to as "Burned In Annotation". These text passages, written into the image pixels, can contain sensitive information about the patient, the radiological technologists, or even the referring clinician. We found our DICOM headers routinely inaccurate regarding the presence of the "Burned In Annotation" attribute. Automated text-recognition methods based on the Tesseract optical character recognition software were not reliable enough to enable unvalidated masking of potentially sensitive information. We therefore screened all images manually and masked the affected images (n = 97) with black boxes in IrfanView. In contrast to other publicly available datasets, we also masked radiographer abbreviations.
Image labeling. Board-certified pediatric radiologists (S.T., E.N., and E.S.) with between 6 and 29 years of experience in musculoskeletal radiology validated all image annotations, which were performed on the Supervisely (Deep Systems LLC, Moscow, Russia) artificial intelligence online platform. A dedicated server hosted the web-based image database, allowing for collaborative labeling. Clients accessed the server with a web browser over the Internet. Apart from the validating radiologists mentioned above, local radiologists, visiting colleagues, and medical students contributed to the dataset with varying shares of labeling time. All annotations were made between March 2018 and February 2022.
The annotators manually placed objects with dedicated tools. Among others, we annotated fractures, metal implants, periosteal reactions, and bone lesions with bounding boxes, polygons, or lines. Image tags were set manually to represent features of each image, when appropriate. Collected objects (Fig. 3) and image tags are listed in Tables 1 and 2.
Pediatric radiology reports. All available radiology reports were manually read by the authors and classified as either fracture or no fracture. They served as the basis for labeling fractures in the annotation period. The German natural language reports are not delivered together with the dataset because 1) we were not able to assure full anonymity, and 2) they might occasionally not be correctly linked to the corresponding image, because images of different dates and/or body regions were pooled into single free-text reports. However, we are willing to provide the reports for research purposes on request.

Data Records
Data are provided for download on Figshare (https://doi.org/10.6084/m9.figshare.14825193) 32. We grant free access to the dataset, without the need for user registration. The dataset is distributed in ZIP archives with a total size of 15.2 Gigabytes (GB), which preserve the original folder structure described below.
Folder structure. We provide ready-to-use data in proprietary Supervisely project ("supervisely" folder) format, in PASCAL Visual Object Classes (VOC) 33 format ("pascalvoc" folder), in YOLOv5 34 format ("yolov5" folder), as well as image tags in comma separated values (CSV) format. The whole project structure is shared by a ZIP file.
Fig. 2 Meaning of the individual file name parts. File names for images and annotations start with a four-digit patient number, followed by a hashed examination time, the ascending study number, the region code (all "WRI"), the laterality and projection code, and finally sex and age in years.
The root folder contains a CSV file ("dataset.csv") with all filenames, basic patient information, and image tags. The "notebooks" folder holds three pieces of code, released as Jupyter notebooks: one for previewing annotations, one for splitting files based on two columns of a CSV file, and one for image post-processing. The "images" folder contains all wrist radiographs in 16-bit PNG format. The "supervisely" folder has a single "meta.json" file and is structured into a "wrist" subfolder that consists of an "ann" subfolder with JSON annotations and an empty "img" subfolder. This "img" subfolder needs to be populated from the "images" folder, if needed. The "pascalvoc" folder contains Extensible Markup Language (XML) data files for object detection. This folder also does not include image files, which need to be copied from the "images" folder, if necessary. Every annotation file is associated with a corresponding image file, featuring the same file basename and differing only in the file extension. The "yolov5" folder contains the labels as text files (TXT) in the dedicated YOLOv5 format. It comes with a "meta.yaml" file for basic settings and object mappings.
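As a hedged illustration of the YOLOv5 label format, each line of a label file holds a class index followed by the box center, width, and height, all normalized to the image dimensions. The sketch below converts one such line back to pixel coordinates. The function name is hypothetical, and the meaning of the class index must be taken from the "meta.yaml" file, which is not reproduced here.

```python
def yolo_to_pixel_box(line: str, img_width: int, img_height: int):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max)."""
    fields = line.split()
    class_id = int(fields[0])
    # Normalized center coordinates and box size, each in the range 0..1
    x_c, y_c, w, h = (float(value) for value in fields[1:5])
    x_min = (x_c - w / 2) * img_width
    x_max = (x_c + w / 2) * img_width
    y_min = (y_c - h / 2) * img_height
    y_max = (y_c + h / 2) * img_height
    return class_id, x_min, y_min, x_max, y_max
```

The image width and height are not stored in the label file and would have to be read from the corresponding PNG image.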
An annotated example study is provided in Fig. 4. A total of 67,771 objects were annotated in the dataset, divided into the categories detailed in Table 1. The "axis" object consists of a two-point line along the main axis of the forearm bones. It is intended to help in automatically aligning the images. Bone anomalies ("boneanomaly") are a heterogeneous group of objects, representing bone pathologies from drill holes to Madelung's deformities. Bone lesions ("bonelesion") are bone tumors like osteomas. "foreignbody" labels foreign bodies visible in the image. The "fracture" object is used to annotate fractures. The "metal" object is used for internal or external metal implants. The annotators labeled periosteal reactions and callus with polygons under the label "periostealreaction". The "pronatorsign" annotation represents the specific radiological sign of a swelling in the Musculus pronator quadratus region 35,36. "softtissue" refers to an unspecific soft tissue swelling, usually due to trauma. "text" marks any visible text passage or single letters, most commonly the laterality indicators "L" and "R".
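How the "axis" object might support image alignment can be sketched as follows. The example computes the angle between the two-point axis line and the vertical image axis. It assumes that the two end points have already been read from an annotation file as (x, y) pixel coordinates with the y axis pointing downward. The function name is hypothetical.

```python
import math


def axis_deviation_deg(p1, p2):
    """Angle in degrees between the axis line and the vertical image axis."""
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    # Orient the line so that it points downward, making the result
    # independent of the order of the two points (except for an exactly
    # horizontal line, where the sign remains ambiguous).
    if dy < 0:
        dx, dy = -dx, -dy
    return math.degrees(math.atan2(dx, dy))
```

A result of 0 means that the forearm axis is already vertical. The direction in which an image has to be rotated to compensate for a nonzero angle depends on the rotation convention of the image library in use.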
The annotators set 74,459 image tags in the 20,327 images. They included AO/OTA ("Arbeitsgemeinschaft für Osteosynthesefragen"/"Orthopaedic Trauma Association") classifications (2018 version) 37 in images with acute or subacute fractures, but not in healed or remodeled fractures. Images might contain AO classifications without a labeled fracture object (bounding box) in cases where a fracture is known to be present but not clearly visualized in the respective projection. The image tags are listed in Table 2.

Technical Validation
The image data are of high quality and are delivered in full pixel resolution with the complete grayscale spectrum. We did not apply any post-processing apart from histogram normalization (stretching) of the usually 12-bit input spectrum to 16 bits to meet PNG image format requirements. Note that no information is lost through this reversible procedure (Fig. 5). Post-processing is advised for typical computer vision tasks. We attached a notebook to optimize image contrast and convert the gray levels to 8 bits ("image_conversion.ipynb").
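The reversibility of the histogram stretching and a simple reduction to 8 bits can be illustrated with a short sketch. It uses NumPy rather than the "cv2" module named in the Methods, assumes that the input uses the full 12-bit range of 0 to 4,095, and replaces the contrast optimization of the attached notebook with a plain percentile-based rescaling. All function names and percentile defaults are assumptions.

```python
import numpy as np


def stretch_12_to_16(pixels: np.ndarray) -> np.ndarray:
    """Linearly map 12-bit values (0..4095) onto the 16-bit range (0..65535)."""
    scaled = np.rint(pixels.astype(np.float64) * 65535.0 / 4095.0)
    return scaled.astype(np.uint16)


def restore_12_from_16(pixels: np.ndarray) -> np.ndarray:
    """Invert the stretching; exact because each 12-bit level stays distinct."""
    scaled = np.rint(pixels.astype(np.float64) * 4095.0 / 65535.0)
    return scaled.astype(np.uint16)


def to_8bit(pixels16: np.ndarray, low_pct: float = 1.0,
            high_pct: float = 99.0) -> np.ndarray:
    """Percentile-based intensity rescaling of a 16-bit image to 8 bits."""
    lo, hi = np.percentile(pixels16, [low_pct, high_pct])
    if hi <= lo:
        # Constant image: nothing to rescale
        return np.zeros(pixels16.shape, dtype=np.uint8)
    clipped = np.clip(pixels16.astype(np.float64), lo, hi)
    return np.rint((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)
```

Stretching multiplies every gray level by roughly 16, so rounding never merges two neighboring 12-bit levels and the original values can be recovered exactly. The reduction to 8 bits, in contrast, is lossy and is intended only as a preprocessing step for models that expect 8-bit input.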
All studies and their annotations were reviewed by experienced pediatric radiologists at least twice. Still, users need to consider inevitable inaccuracies, discrepancies, and errors in labeling due to the restricted diagnostic sensitivity of X-ray studies 38,39. Images in question are tagged "diagnosis_uncertain" (n = 537, 2.64%). However, it must be kept in mind that annotation errors might not be confined to these images, but could be present within the whole dataset. AO classification codes in particular are not always unambiguous. Images that are considered normal might contain occult fractures, while we believe that studies rated as pathologic are accurately tagged in the majority of cases.
We assessed "fracture" labeling agreement between the validating pediatric radiologists with intersection over union (IoU) measurements in a subsample of 100 randomly chosen overlapping images, resulting in a mean IoU of 0.70 for the blinded raters (1st percentile 0.22, 25th percentile 0.63, 50th percentile = median 0.73, 75th percentile 0.80, 99th percentile 0.94). An IoU of 1 corresponds to perfectly overlapping bounding boxes, while 0 means no overlap at all. Thus, inter-rater bounding box agreement was moderate. However, the whole dataset was labeled in an iterative, continuous fashion, in which labels were gradually adjusted multiple times by different annotators. Labeling quality is therefore considered more accurate than the IoU level alone might indicate.
Object detection performance. We trained a state-of-the-art neural network for object detection, namely YOLOv5 34, after randomly dividing the total dataset of 20,327 images into a training set of 15,327 images, a validation set of 4,000 images, and a test set of 1,000 images. A Windows PC equipped with an Nvidia GeForce RTX 2060 SUPER (video memory size of 8,192 MB) ran 50 epochs of a COCO 40 pre-trained "YOLOv5m" model with an input size of 640 pixels and a batch size of 16 samples. Standard hyperparameters were used. The Python version was 3.9.5. Table 3 lists the results (precision, recall, mean average precision = mAP) achieved on the previously unseen test subset of 1,000 random samples. Fractures were detected with a precision of 0.917, a recall of 0.887, and a mAP of 0.933 at an IoU threshold of 0.5.
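The IoU measure used for the inter-rater comparison, and as the matching threshold in the detection results, can be sketched for two axis-aligned bounding boxes as follows. The box representation as (x_min, y_min, x_max, y_max) and the function name are assumptions; the sketch does not reproduce the authors' evaluation code.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, zero if the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0
```

For example, two boxes of equal size that are shifted against each other by half their width share one third of their combined area, which gives an IoU of about 0.33.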

Usage Notes
The dataset is made freely available for any purpose. The data provided within this work are free to copy, share, or redistribute in any medium or format. The data may be adapted, remixed, transformed, and built upon. The dataset is licensed under a Creative Commons "Attribution 4.0 International" (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0/). Users must not attempt to re-identify personal patient information.
Correct use of the dataset requires medical and radiological background knowledge, especially in interpreting obtained results and in drawing dataset-based conclusions. Potential labeling errors must be taken into consideration, as pointed out in the "Technical Validation" section. Our key aim in releasing this dataset is to encourage research in automated pediatric fracture detection. We believe that releasing the GRAZPEDWRI-DX dataset could improve research in this field.
Computer vision algorithms commonly rely on input images with 8 bits per channel for training and inference, particularly when using transfer learning based on large image archives. The dataset contains a Jupyter notebook that enables image conversion from the 16-bit source images to 8-bit samples ("image_conversion.ipynb"). It incorporates options to apply sharpening and contrast enhancement by means of intensity rescaling and contrast limited adaptive histogram equalization (CLAHE), enabled by the Python "scikit-image (skimage)" package 41.
The GRAZPEDWRI-DX dataset enables assessment of a variety of research questions around the injured pediatric wrist. To our knowledge, there are no related pediatric datasets publicly available. In adults, there is a limited number of musculoskeletal radiography collections available to the community. The Stanford University datasets MURA 25 and LERA 42 contain large numbers of samples but feature only binary labels, namely "normal" and "abnormal". The MURA dataset consists of 14,863 studies (total of 40,561 multi-view radiographic images) from 12,173 patients 25. The LERA dataset comprises lower extremity radiographs of 182 patients 42.
Both lack the comprehensive labeling that we provide within the current dataset.
We might release further revisions of the dataset annotations in the future.

Code Availability
All anonymization steps were computed in Python 3.8.2 on a Windows 8.1 platform as described in the Methods section. We are not able to publicly share the actual code involved in the de-identification procedure, as patient information was processed. The de-identification procedure should be sufficiently reproducible based on the presented information.
Table 3. Results of baseline object testing with a pre-trained YOLOv5 model.

Fig. 5
Histograms of six sample images showing their grayscale spectra. Note that the full grayscale range was preserved when converting and normalizing the images. The relative number of pixels per grayscale level is normalized from 0 to 1 for display purposes.